-
Notifications
You must be signed in to change notification settings - Fork 1k
Add custom interpreter for RayJob #6947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add custom interpreter for RayJob #6947
Conversation
Summary of ChangesHello @owenowenisme, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a custom interpreter for Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a custom resource interpreter for RayJob (ray.io/v1) to enable Karmada to understand and manage Ray distributed computing jobs across multiple clusters. The implementation provides component resource extraction, health checking, and status aggregation capabilities for RayJob workloads.
Key changes:
- Custom interpreter implementation with Lua scripts for component, health, and status operations
- Test data files demonstrating RayJob structure and multi-cluster scenarios
- Test configuration to validate the interpreter operations
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| customizations.yaml | Implements Lua scripts for component extraction, health interpretation, and status aggregation for RayJob resources |
| customizations_tests.yaml | Defines test cases for the three interpreter operations |
| observed-rayjob.yaml | Test data representing an observed RayJob instance with status |
| desired-rayjob.yaml | Test data representing the desired RayJob specification |
| status-file.yaml | Test data with status information from multiple clusters for aggregation testing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...ceinterpreter/default/thirdparty/resourcecustomizations/ray.io/v1/RayJob/customizations.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a custom resource interpreter for RayJob resources, including Lua scripts for component discovery, health assessment, and status aggregation. The overall implementation is solid and well-tested. I have two main suggestions for improvement. First, the health interpretation logic can be simplified for better readability and conciseness. Second, and more critically, the status aggregation for rayClusterStatus should sum up resource metrics from all member clusters instead of just taking the values from the first one, which would provide a more accurate representation of the total resource state. I've provided specific suggestions for these changes.
...ceinterpreter/default/thirdparty/resourcecustomizations/ray.io/v1/RayJob/customizations.yaml
Outdated
Show resolved
Hide resolved
...ceinterpreter/default/thirdparty/resourcecustomizations/ray.io/v1/RayJob/customizations.yaml
Show resolved
Hide resolved
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #6947 +/- ##
==========================================
+ Coverage 46.45% 46.51% +0.06%
==========================================
Files 698 698
Lines 47809 47876 +67
==========================================
+ Hits 22208 22270 +62
- Misses 23930 23933 +3
- Partials 1671 1673 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4a9fa77 to
0230887
Compare
| jobStatus = currentStatus.jobStatus | ||
| end | ||
|
|
||
| -- Take first non-nil identifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these values collected below not very valuable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we only left the jobDeploymentStatus jobStatus message and reason
Other info can be inferred from RayCluster
| failed = failed + currentStatus.failed | ||
| end | ||
|
|
||
| -- Take minimum observedGeneration (most conservative) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The collected observedGeneration seems to have no value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
XiShanYongYe-Chang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to ask whether it is necessary to process the current resources regarding the dependencyInterpretation hook point.
Unlike RayCluster, RayJob may need resources like ConfigMap to run job script or something else, so I added Output: karmadactl interpret -f customizations.yaml --operation InterpretDependency --desired-file testdata/desired-rayjob-with-dependen
cies.yaml
---
# [1/1] dependencies:
- apiVersion: v1
kind: ConfigMap
name: app-config
namespace: default
- apiVersion: v1
kind: ConfigMap
name: db-config
namespace: default
- apiVersion: v1
kind: ConfigMap
name: init-config
namespace: default
- apiVersion: v1
kind: ConfigMap
name: projected-config
namespace: default
- apiVersion: v1
kind: ConfigMap
name: ray-env-config
namespace: default
- apiVersion: v1
kind: ConfigMap
name: ray-job-code-sample
namespace: default
- apiVersion: v1
kind: ConfigMap
name: submitter-config
namespace: default
- apiVersion: v1
kind: PersistentVolumeClaim
name: worker-cache-pvc
namespace: default
- apiVersion: v1
kind: Secret
name: api-credentials
namespace: default
- apiVersion: v1
kind: Secret
name: projected-secret
namespace: default
- apiVersion: v1
kind: Secret
name: ray-env-secrets
namespace: default
- apiVersion: v1
kind: Secret
name: registry-secret
namespace: default
- apiVersion: v1
kind: Secret
name: submitter-secrets
namespace: default
- apiVersion: v1
kind: Secret
name: tls-certs
namespace: default
- apiVersion: v1
kind: Secret
name: worker-data-secret
namespace: default
- apiVersion: v1
kind: ServiceAccount
name: ray-head-sa
namespace: default |
e848a62 to
b7b0a84
Compare
|
Here is a lint error, can you help fix it? Other parts LGTM, can you help squash all the commits into one? Then we can merge it. |
b7b0a84 to
a5c3051
Compare
Signed-off-by: You-Cheng Lin (Owen) <[email protected]> update Signed-off-by: You-Cheng Lin (Owen) <[email protected]>
a5c3051 to
fc86224
Compare
XiShanYongYe-Chang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks~
/lgtm
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: XiShanYongYe-Chang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Part of #6588
Special notes for your reviewer:
1. InterpretComponent
Command:
Result:
2. InterpretHealth
Command:
Result:
3. AggregateStatus (Multi-RayJob)
Command:
Does this PR introduce a user-facing change?: